Design and Implementation of Focused Web Crawler Using Genetic Algorithm: An Approach to Web Mining
Abstract
The World Wide Web (WWW) grows around the clock, expanding from small collections of web pages into a massive hub of web information, which steadily increases the complexity of the crawling process. Search engines handle an enormous number of queries from all parts of the world and must retrieve the most relevant results in response to them; this depends solely on the knowledge they gather by crawling. Focused web crawlers are emerging to tackle this issue. A focused crawler is kept focused on the user's topic of interest, so the crawling process should be optimal, and to make crawling optimal one should use the available optimization techniques. This paper proposes a web crawler that uses a genetic algorithm. The genetic algorithm serves as the optimization technique by which the crawler selects more reliable and appropriate web pages. It uses similarity measures to determine the relevancy of the web pages. The results showed that our approach yields higher-quality results than traditional focused crawling techniques.
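The abstract does not specify the chromosome encoding, the fitness function, or the similarity measure, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes cosine similarity over term-frequency vectors as the similarity measure, a bit mask over candidate pages as the chromosome, and a hypothetical page budget; the names `cosine_similarity`, `select_pages`, and the toy `pages` data are illustrative and do not come from the paper.

```python
import math
import random
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_pages(pages, topic, budget=2, pop_size=20, generations=30,
                 mutation_rate=0.1, seed=0):
    """Evolve a bit mask over `pages` (url -> text); needs at least 2 pages.

    Returns (selected urls, best fitness recorded per generation).
    """
    rng = random.Random(seed)
    urls = sorted(pages)
    scores = [cosine_similarity(pages[u], topic) for u in urls]

    def fitness(mask):
        chosen = [s for bit, s in zip(mask, scores) if bit]
        # Assumption: masks that exceed the page budget are worthless.
        return sum(chosen) if len(chosen) <= budget else 0.0

    def tournament(population):
        # Selection: the fitter of two randomly drawn individuals.
        return max(rng.sample(population, 2), key=fitness)

    population = [[rng.randint(0, 1) for _ in urls] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        next_gen = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, len(urls) - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability `mutation_rate`.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return [u for u, bit in zip(urls, best) if bit], history


# Hypothetical candidate pages (url -> extracted text), for illustration only.
pages = {
    "u1": "genetic algorithm focused web crawler",
    "u2": "pasta recipes and cooking tips",
    "u3": "web crawler optimization with a genetic algorithm",
    "u4": "football scores",
}
selected, history = select_pages(pages, "focused web crawler genetic algorithm")
print(selected, history[-1])
```

Because the best mask is copied unchanged into each new generation, the best fitness in `history` never decreases. In a real crawler the page texts would come from fetched documents, and the fitness could combine several similarity measures instead of one.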
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. This unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...
QoS-Based web service composition based on genetic algorithm
Quality of service (QoS) is an important issue in the design and management of web service composition. QoS in web services consists of various non-functional factors, such as execution cost, execution time, availability, successful execution rate, and security. In recent years, the number of available web services has proliferated, and increasingly many of them offer the same services. The same web ...
Focused Crawling using Asynchronous Cellular Learning Automata
Web crawling is used to collect the web pages that will be indexed by a search engine. The search engine uses these crawled and indexed pages to answer users’ queries. Since the volume of web pages is very high and increases continuously, search engines can index only a limited number of web pages. Therefore, in recent years, focused crawler algorithms have been introduced which act selectiv...
A Framework for Deep Web Crawler Using Genetic Algorithm
The Web has become one of the largest and most readily accessible repositories of human knowledge. Traditional search engines index only the surface Web, whose pages are easily found. The focus has now moved to the invisible Web or hidden Web, which consists of a large warehouse of useful data such as images, sounds, presentations and many other types of media. To use such data, there is a need...